Cell Systems

Elsevier BV

Preprints posted in the last 7 days, ranked by how well they match Cell Systems's content profile, based on 167 papers previously published here. The average preprint has a 0.54% match score for this journal, so anything above that is already an above-average fit.

1
Heterogeneous, Population-Level Drug-Tolerant Persisters Exhibit Ion-Channel Remodeling and Ferroptosis Susceptibility

Hayford, C. E.; Baleami, B.; Stauffer, P. E.; Paudel, B. B.; Al'Khafaji, A.; Brock, A.; Quaranta, V.; Tyson, D. R.; Harris, L. A.

2026-04-13 systems biology 10.1101/2022.02.03.479045 medRxiv
Top 0.6%
14.6%

Drug-tolerant persisters (DTPs) represent a major obstacle to durable responses in targeted cancer therapy. DTPs are commonly described as distinct single-cell states that survive drug treatment via reversible, non-genetic mechanisms and drive tumor recurrence. Recent work demonstrates that multiple DTPs can coexist, reflecting diversity in lineage, signaling programs, or stress responses. However, each DTP is still generally viewed as a uniform cellular phenotype. Building on our prior work describing a population-level DTP termed "idling" [Paudel et al., Biophys. J. (2018) 114, 1499-1511], here we present evidence supporting a fundamentally different view: that DTPs are not single-cell states, but rather heterogeneous populations composed of multiple sub-states with distinct division and death rates that balance to produce near-zero net population growth. Using single-cell transcriptomics and lineage barcoding, we identify multiple phenotypic states within idling DTP populations, with reduced heterogeneity compared to untreated populations, and find that idling DTP cells emerge from nearly all lineages. Transcriptomic and functional analyses further reveal altered ion-channel activity in idling DTPs, which we confirm experimentally. Moreover, drug-response assays reveal increased susceptibility of idling DTPs to ferroptosis, a non-apoptotic form of regulated cell death, indicating the emergence of vulnerabilities associated with drug tolerance. Altogether, our results support a population-level view of tumor drug tolerance in which DTPs comprise stable collections of phenotypic states, shaped by treatment-defined phenotypic landscapes, which are potentially vulnerable to subsequent interventions. 
This perspective implies that eradicating DTPs will require a fundamental shift away from cell-type-centric strategies toward sequential treatments that progressively reduce phenotypic heterogeneity by modulating the molecular and cellular processes that establish the DTP landscape, an approach previously termed "targeted landscaping."

2
GRASP: Gene-relation adaptive soft prompt for scalable and generalizable gene network inference with large language models

Feng, Y.; Deng, K.; Guan, Y.

2026-04-14 bioinformatics 10.1101/2025.10.20.683485 medRxiv
Top 1.0%
10.1%

Gene networks (GNs) encode diverse molecular relationships and are central to interpreting cellular function and disease. The heterogeneity of interaction types has led to computational methods specialized for particular network contexts. Large language models (LLMs) offer a unified, language-based formulation of GN inference by leveraging biological knowledge from large-scale text corpora, yet their effectiveness remains sensitive to prompt design. Here, we introduce Gene-Relation Adaptive Soft Prompt (GRASP), a parameter-efficient and trainable framework that conditions inference on each gene pair through only three virtual tokens. Using factorized gene-specific and relation-aware components, GRASP learns to map each pair's biological context into compact soft prompts that combine pair-specific signals with shared interaction patterns. Across diverse GN inference tasks, GRASP consistently outperforms alternative prompting strategies. It also shows a stronger ability to recover unannotated interactions from synthetic negative sets, suggesting its capacity to identify biologically meaningful relationships beyond existing databases. Together, these results establish GRASP as a scalable and generalizable prompting framework for LLM-based GN inference.

3
Vector2Variant: Discovery of Genetic Associations from ML Derived Representations without Phenotype Engineering

Sooknah, M.; Srinivasan, R.; Sankarapandian, S.; Chen, Z.; Xu, J.

2026-04-17 genetic and genomic medicine 10.64898/2026.04.10.26350624 medRxiv
Top 2%
6.1%

Genome-wide association studies (GWAS) have transformed our understanding of human biology, but are constrained by the need for predefined phenotypes. We introduce Vector2Variant (V2V), a general-purpose framework that transforms any set of high-dimensional measurements (such as machine learning embeddings) into a genome-wide scan for associations, without requiring rigid specification of a phenotype. Rather than testing genetic variants against single traits, V2V finds the axis in multivariate space along which carriers and non-carriers maximally differ, and produces a continuous "projection phenotype" that can be interpreted by association with disease labels. The projection phenotypes correlate with orthogonal clinical biomarkers never seen during training, suggesting the learned axes capture biologically meaningful variation. We applied V2V to imaging, timeseries, and omics modalities in the UK Biobank and recovered established biology (like the role of CASP9 in renal failure) without the need for targeted measurements, alongside novel associations including a frameshift variant in LRRIQ1 (potentially protective for cardiovascular disease). V2V is computationally efficient at genome-wide scale, producing summary statistics and disease associations that facilitate target prioritization without the need for phenotype engineering.
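
The "projection phenotype" idea can be illustrated with a very small sketch. The abstract does not specify how V2V finds its axis, so the code below substitutes the simplest stand-in: the unit vector along the difference of group means, which is the unit direction that maximizes the gap between the projected carrier and non-carrier means. The function names (`projection_axis`, `project`) and the toy embeddings are hypothetical and are not taken from the preprint.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def projection_axis(carriers, non_carriers):
    """Unit vector along (mean of carriers - mean of non-carriers).

    Assumption: a plain mean-difference axis, used here only as a stand-in
    for whatever axis-finding procedure the preprint actually uses.
    """
    diff = [a - b for a, b in zip(mean_vector(carriers), mean_vector(non_carriers))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("group means coincide; no separating axis")
    return [d / norm for d in diff]

def project(rows, axis):
    """Scalar projection of each embedding onto the axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in rows]

# Toy 2-D embeddings (hypothetical data).
carriers = [[2.0, 0.0], [4.0, 0.0]]
non_carriers = [[1.0, 1.0], [1.0, -1.0]]

axis = projection_axis(carriers, non_carriers)
carrier_scores = project(carriers, axis)
non_carrier_scores = project(non_carriers, axis)
print(axis, carrier_scores, non_carrier_scores)
```

Each sample ends up with one continuous score, which is the kind of quantity that could then be tested for association with disease labels.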

4
Efficient generation of epitope-targeted de novo antibodies with Germinal

Mille-Fragoso, L. S.; Driscoll, C. L.; Wang, J. N.; Dai, H.; Widatalla, T. M.; Zhang, J. L.; Zhang, X.; Rao, B.; Feng, L.; Hie, B. L.; Gao, X. J.

2026-04-15 synthetic biology 10.1101/2025.09.19.677421 medRxiv
Top 3%
3.9%

Obtaining novel antibodies against specific protein targets is a widely important yet experimentally laborious process. Meanwhile, computational methods for antibody design have been limited by low success rates that currently require resource-intensive screening. Here, we introduce Germinal, a broadly enabling generative pipeline that designs antibodies against specific epitopes with nanomolar binding affinities while requiring only low-n experimental testing. Our method co-optimizes antibody structure and sequence by integrating a structure predictor with an antibody-specific protein language model to perform de novo design of functional complementarity-determining regions (CDRs) onto a user-specified structural framework. When tested against four diverse protein targets, Germinal successfully designed functional antibodies across all targets and binder formats, testing only 43-101 designs for each antigen. Validated designs also exhibited robust expression in mammalian cells and high sequence and structural novelty. We provide open-source code and full computational and experimental protocols to facilitate wide adoption. Germinal represents a milestone in efficient, epitope-targeted de novo antibody design, with notable implications for the development of molecular tools and therapeutics.

5
Deriving LD-adjusted GWAS summary statistics through linkage disequilibrium deconvolution

Nouira, A.; Favre Moiron, M.; Tournaire, M.; Verbanck, M.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350574 medRxiv
Top 6%
1.9%

Genome-wide association studies (GWAS) have identified numerous genetic variants associated with complex traits. However, linkage disequilibrium (LD) confounds these associations, leading to false positives where non-causal variants appear associated because they are correlated with nearby causal variants. This is particularly the case in highly polygenic traits, where the genome can be saturated with causal variants. To address this issue, we propose LDeconv, a method based on truncated singular value decomposition (SVD) that adjusts GWAS summary statistics without requiring individual-level genotype data. This approach accounts for LD structure, isolates causal variants in high-LD regions, and improves the reliability of effect size estimates. We assess its performance through simulations across various LD scenarios, conduct extensive sensitivity analyses, and apply it to real GWAS data from the UK Biobank. Our results demonstrate that LDeconv effectively reduces false discoveries while preserving true associations, offering a robust framework for post-GWAS analysis.
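
A rough sketch of what deconvolution by truncated SVD can look like is given below. It assumes the common summary-statistic model in which marginal z-scores are approximately the LD matrix times the joint (LD-adjusted) effects, and it inverts that relation with a rank-limited pseudo-inverse. This is not the LDeconv implementation: the function name `ld_deconvolve`, the `rank` parameter, and the toy LD matrix are all hypothetical, and the code relies only on NumPy's `np.linalg.svd`.

```python
import numpy as np

def ld_deconvolve(z_marginal, ld_matrix, rank):
    """Approximate joint effects from marginal z-scores.

    Assumed model: z_marginal ~= ld_matrix @ z_joint. The LD matrix is
    inverted through its top-`rank` singular components only, which
    avoids dividing by the smallest, noisiest singular values.
    """
    z = np.asarray(z_marginal, dtype=float)
    r = np.asarray(ld_matrix, dtype=float)
    u, s, vt = np.linalg.svd(r)
    if not 1 <= rank <= len(s):
        raise ValueError("rank must be between 1 and the number of variants")
    u_k, s_k, vt_k = u[:, :rank], s[:rank], vt[:rank, :]
    # Truncated pseudo-inverse: V_k diag(1/s_k) U_k^T applied to z.
    return vt_k.T @ ((u_k.T @ z) / s_k)

# Toy example: variants 0 and 1 are in high LD, only variant 0 is causal.
ld = np.array([[1.0, 0.9, 0.0],
               [0.9, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
true_joint = np.array([3.0, 0.0, 0.0])
z_marginal = ld @ true_joint          # variant 1 looks associated through LD
adjusted = ld_deconvolve(z_marginal, ld, rank=3)
print(np.round(adjusted, 3))
```

With the full rank the toy signal is attributed back to variant 0. Lowering `rank` makes the inversion more stable but, in this toy case, spreads the signal evenly across the two correlated variants, so the choice of rank is a trade-off rather than a free parameter.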

6
Loss of MITF activity leads to emergent cell states from the melanocyte stem cell lineage

Brombin, A.; MacMaster, S.; Travnickova, J.; Wyatt, C.; Brunsdon, H.; Ramsey, E.; Vu, H. N.; Steingrimsson, E.; Kenny, C.; Chandra, T.; Patton, E. E.

2026-04-12 developmental biology 10.64898/2025.12.23.695681 medRxiv
Top 7%
1.6%

How embryonic cells generate large clones of cells in the adult represents a fundamental question in biology. Here, using melanocyte stem cells (McSCs) in the zebrafish as a model, we explore the function of the master melanocyte transcription factor (MITF) in safeguarding McSCs in embryonic development and their potential to pigment large clones in the adult. MITF is well known for its role in the specification of melanoblasts from the neural crest (NC) and their differentiation into melanocytes, yet little is known about how this activity shapes the stem cell lineages. Here, we use live imaging coupled with single-cell transcriptomics and lineage tracing to show that MITF (mitfa in zebrafish) protects the melanocyte stem cell (McSC) fate in zebrafish. Utilizing a temperature-sensitive mitfavc7 mutant, we show that loss of Mitfa leads to a surprising premature and aberrant expansion of McSC progeny at the niche during embryogenesis, coupled with novel emergent transcriptional cell states. Lineage tracing of McSCs from the embryonic to juvenile stages reveals that Mitfa activity is subsequently required in regeneration by Schwann cell-like and melanocyte stem cell progenitors that serve as a reservoir for fast-responding pigment progenitors. Thus, the impact of Mitfa loss on the melanocyte lineage is cell-state and stage-specific. The emergent cell states upon mitfa loss may have important implications for our understanding of the loss of MITF activity in human genetic disease and melanoma.

7
Functional PD-1/PD-L1 engagement defines a spatial biomarker of immunotherapy response

Ullman, T.; Krantz, D.; Avenel, C.; Lung, M.; Svedman, F. C.; Holmsten, K.; Ostling, P.; Ullen, A.; Stadler, C.

2026-04-17 oncology 10.64898/2026.04.15.26350929 medRxiv
Top 9%
1.3%

Effective predictive biomarkers for immune checkpoint inhibitor (ICI) therapy remain an unmet need across solid tumors. Here, we present an integrated spatial proteomics workflow that combines in situ proximity ligation assay with multiplexed immunofluorescence to directly resolve PD1/PDL1 signaling events at the level of defined cellular phenotypes and their spatial organization within intact tumor tissue. Applied as a proof of concept to tumor samples from patients with metastatic urothelial carcinoma treated with pembrolizumab, this approach reveals that PD1/PDL1 interactions specifically involving cytotoxic CD8+CD3+ T cells are significantly enriched in complete responders, while such interactions are rare in patients with progressive disease. This interaction-defined T cell subset achieves superior discrimination of clinical response compared to single-marker PDL1 expression or immune cell abundance alone. By integrating direct detection of protein-protein interactions with high-dimensional single-cell phenotyping, our workflow provides a mechanistically informed, spatially resolved biomarker of functional immune engagement. Beyond urothelial carcinoma, this platform establishes a generalizable framework for translating spatial signaling biology into predictive tools for immunotherapy response across tumor types.

8
SIEVE: Locus-Anchored Drug Prioritization for Complex Disorders

Strobl, E. V.

2026-04-17 pharmacology and therapeutics 10.64898/2026.04.15.26350958 medRxiv
Top 10%
1.0%

Motivation: Complex disorders arise from multiple genetic mechanisms, but most drug-prioritization methods treat each disorder as a single phenotype and therefore miss locus-specific therapeutic opportunities. Results: We present SIEVE, a framework that decomposes complex disorders into genetically localized subphenotypes and links GWAS summary statistics, reference expression, and perturbational transcriptional profiles to prioritize compounds that target locus-anchored disease mechanisms. SIEVE also constructs genetically calibrated mechanism vectors, projects away nonspecific expression programs using negative anchors, and aggregates evidence across cell lines, doses, and time points to produce robust drug rankings. Across simulations and analyses of real data, SIEVE improves compound prioritization relative to existing methods and shows that subphenotype-aware, genetics-guided modeling can sharpen therapeutic discovery in heterogeneous disorders. Availability and Implementation: R implementation: github.com/ericstrobl/SIEVE.

9
HAARF: Healthcare AI Agents Regulatory Framework - A Comprehensive Security Verification Standard for Autonomous AI Systems in Clinical Environments

Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.

2026-04-13 health systems and quality improvement 10.64898/2026.04.09.26350519 medRxiv
Top 11%
0.9%

As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]). 
Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment, shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
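
The reported interval for a 0% success rate over N = 50 trials can be checked by hand with the standard Wilson score formula. The short sketch below does this with the Python standard library only; the function name `wilson_interval` is a label chosen here and is not taken from the preprint.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

low, high = wilson_interval(successes=0, trials=50)
print(f"[{low:.2f}, {high:.2f}]")  # prints "[0.00, 0.07]"
```

With zero observed successes the upper bound reduces to z² / (n + z²), roughly 3.84 / 53.84, so 50 clean trials still leave room for a true rate of up to about 7%.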

10
Human Oncogene EWS::FLI1 Functions as a Pioneer Factor in Saccharomyces cerevisiae.

Velazquez, D.; Molnar, C.; Reina, J.; Mora, J.; Gonzalez, C.

2026-04-14 cancer biology 10.1101/2025.10.22.680884 medRxiv
Top 11%
0.8%

Ewing sarcoma (EwS) is an aggressive, human-exclusive tumor typically driven by the EWS::FLI1 fusion protein. To assess whether the neomorphic functions of EWS::FLI1 are fundamentally dependent on evolutionarily recent cofactors such as ETS transcription factors (ETS-TFs), Polycomb group (PcG) proteins, CBP/p300, or specific subunits of the BAF complex, we expressed EWS::FLI1 in the model organism Saccharomyces cerevisiae. This minimal system was chosen because several key EWS::FLI1 cofactors possess greatly reduced sequence homology (e.g., BAF) or are lacking altogether (e.g., ETS-TFs, PcG, or CBP/p300). We used co-IP/MS to map the yeast interactome, ChIP-Seq to identify gDNA binding sequences, RNA-Seq for global gene expression, and engineered reporters to test conversion of (GGAA) tandem repeats (GGAASat) into neoenhancers. We found that the yeast EWS::FLI1 interactome was more limited and qualitatively distinct from its human counterpart, sharing core machinery (e.g., RNA Polymerase II, FACT) but lacking the BAF/SWI-SNF and spliceosome complexes, and showing strong enrichment for the SAGA chromatin remodeling complex. We also found that EWS::FLI1 binds to hundreds of sites in the yeast genome with a clear preference for putative ETS-TF consensus sequences and (CA) dinucleotide repeats. Yet, EWS::FLI1-expressing cells presented only minimal transcriptional dysregulation, a stark contrast to the extensive changes observed in human and Drosophila cells. Finally, we found that EWS::FLI1 successfully converted silent GGAASat sequences into active enhancers in yeast. This remarkable result occurs despite the absence of homologs for key human activators, such as CBP/p300, strongly suggesting that EWS::FLI1 can mobilize functionally related, non-homologous pathways to establish neoenhancers at GGAASat sites. 
Altogether, our results indicate that EWS::FLI1's core ability to drive GGAASat-dependent gene expression is a conserved, ancient property, while GGAASat-independent extensive transcriptome reprogramming is dependent on co-factors and pathways specific to animal cells.

11
Ad-verse Effects: Pharmaceutical Advertising Shifts Drug Recommendations by Consumer-Facing AI

Omar, M.; Agbareia, R.; McGreevy, J.; Zebrowski, A.; Ramaswamy, A.; Gorin, M.; Anato, E. M.; Glicksberg, B. S.; Sakhuja, A.; Charney, A.; Klang, E.; Nadkarni, G.

2026-04-16 health policy 10.64898/2026.04.14.26350868 medRxiv
Top 11%
0.8%

Large language models are increasingly used for clinical guidance while their parent companies introduce advertising. We tested whether pharmaceutical ads embedded in the prompts of 12 models from OpenAI, Anthropic, and Google shift drug recommendations across 258,660 API calls and four experiments probing distinct epistemic conditions. When two drugs were both guideline-appropriate, advertising shifted selection of the advertised drug by +12.7 percentage points (P < 0.001), with some model-scenario pairs shifting from 0% to 100%. Google models were the most susceptible (+29.8 pp), followed by OpenAI (+10.9 pp), while Anthropic models showed minimal change (+2.0 pp). When the advertised product lacked evidence or was clinically suboptimal, models resisted. This reveals a structured vulnerability: advertising does not override medical knowledge but fills the space where clinical evidence is underdetermined. An open-response subanalysis (2,340 calls across three representative models) confirmed that advertising restructures free-text clinical reasoning: models echoed ad claims at 2.7 times the baseline rate while maintaining high stated confidence and rarely disclosing the ad. Susceptibility was provider-dependent (Google: +29.8 pp; OpenAI: +10.9 pp; Anthropic: +2.0 pp). Because this bias operates within clinically correct answers, it is invisible to accuracy-based evaluation, identifying a class of AI safety vulnerability that standard testing cannot detect.

12
Characterization of a pancreatic cancer GWAS signal suggests PDX1 buffers stress in the exocrine pancreas

Hoskins, J. W.; Christensen, T. A.; Eiser, D.; Char, E.; Mobaraki, M.; O'Brien, A.; Collins, I.; Zhong, J.; Patel, M. B.; Prasad, G.; Pancreatic Cancer Cohort Consortium and Pancreatic Cancer Case-Control Consortium (PanScan/PanC4); Arda, E.; Connelly, K. E.; Amundadottir, L. T.

2026-04-15 genetic and genomic medicine 10.64898/2026.04.13.26350790 medRxiv
Top 14%
0.6%

Pancreatic ductal adenocarcinoma (PDAC) remains one of the deadliest human cancers. The current largest published PDAC Genome-Wide Association Study (GWAS) identified 23 genetic risk signals, but most lack sufficient characterization. This study aimed to functionally characterize the chr13q12.2 (PLUT/PDX1) PDAC GWAS risk locus. Fine-mapping, luciferase reporter assays, and electrophoretic mobility shift assays implicated rs9581943, a PDX1 promoter SNP, as a functional variant underlying this GWAS signal. GTEx expression QTL analyses identified rs9581943 as a significant PDX1 eQTL in pancreas, and CRISPR/Cas9 editing in PDAC-derived cell lines confirmed a functional relationship. PDX1 is a transcription factor involved in early pancreas development and β-cell homeostasis, but its role in exocrine pancreatic cells is unclear. Single-nucleus RNA-seq analyses of pancreatic acinar and ductal cells from neonatal, adult, and chronic pancreatitis donors suggested that PDX1 activity alleviates high secretory load and ER stress in acinar cells and biases ductal cells toward homeostatic phenotypes. Similarly, scRNA-seq analyses of pancreatic tumors suggested that PDX1 activity reduces biosynthetic and inflammatory stress and promotes epithelial differentiation. Our study therefore implicates rs9581943 as a causal variant for the chr13q12.2 PDAC GWAS signal wherein the risk allele reduces PDX1 expression, eroding PDX1's capacity to buffer stress and stabilize epithelial cell fate in the exocrine compartment.

13
Shared inheritance reveals landscape of somatic and germline cancer risk in TP53

MacGregor, H. A. J.; Blundell, J. R.; Easton, D. F.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350605 medRxiv
Top 14%
0.6%

Pathogenic variants in TP53, the key tumour-suppressor gene underlying Li-Fraumeni syndrome (LFS), are among the best-established causes of inherited cancer predisposition. However, large-scale sequencing has revealed that many apparently pathogenic TP53 variants detected in blood are the result of somatic clonal expansions, complicating risk interpretation. Using blood-derived whole-exome data from 469,391 UK Biobank participants, we combined variant allele fraction (VAF) with haplotype-sharing analysis to distinguish germline and somatic TP53 variants. Germline variants were concentrated at sites linked to partial loss of p53 function and lower disease penetrance, whereas classic LFS alleles appeared almost entirely somatic. High-VAF carriers of classic LFS alleles conferred markedly increased risk of haematological malignancy but not solid tumours, consistent with large TP53-mutant clonal expansions. The prevalence of somatic clonal expansion also correlated with missense variant pathogenicity, suggesting that somatic activity provides an informative in vivo proxy for functional impact. These results provide new insights into TP53-associated cancer risk at the population level, demonstrate that somatic rather than germline risk predominates in middle-aged healthy adults and provide a scalable framework for variant classification in large-scale population genomics.

14
Single-molecule cfDNA sequencing establishes clinical utility for ecDNA monitoring and multimodal liquid biopsy analysis

Sauer, C. M.; Tovey, N.; Ptasinska, A.; Hughes, D.; Stockton, J.; Zumalave, S.; Rust, A. G.; Lynn, C.; Livellara, V.; Sevrin, F.; Himsworth, C.; Muyas, F.; Nicolaidou, M.; Parry, G.; Paisana, E.; Cascao, R.; Ahmed, S. W.; Yasin, S. A.; Portela, L. R.; Balasubramanian, P.; Burke, G. A. A.; Vedi, A.; Faria, C. C.; Marshall, L. V.; Jacques, T. S.; Hubank, M.; Hargrave, D.; George, S.; Angelini, P.; Anderson, J.; Chesler, L.; Beggs, A. D.; Cortes-Ciriano, I.

2026-04-12 oncology 10.64898/2026.04.08.26350410 medRxiv
Top 14%
0.6%

Cell-free DNA (cfDNA) profiling enables minimally invasive cancer detection and monitoring. We present SIMMA, a low-input single-molecule sequencing approach that enables multimodal whole-genome and high-depth targeted sequencing of the same cfDNA sample for both tumour-agnostic and tumour-informed liquid biopsy analysis. Across 792 plasma and cerebrospinal fluid cfDNA samples from 277 paediatric patients with diverse brain and extracranial tumours, SIMMA enabled tumour diagnosis, detection of driver mutations, and reconstruction of extrachromosomal DNA (ecDNA) months before clinical relapse. Using conformal prediction trained on genome-wide fragmentomics, genomic and epigenomic data, SIMMA predicts disease burden as a continuous variable and provides well-calibrated uncertainty estimates for each sample, achieving a limit of detection of ~100 ppm from low-pass whole-genome sequencing data. In summary, SIMMA establishes the clinical utility of multimodal cfDNA profiling with uncertainty quantification for individual patients and unlocks the potential of ecDNA as a liquid biopsy biomarker for disease detection and monitoring across diverse aggressive malignancies.

15
SCOPE: Integrating Organoid Screening and Clinical Variables Through Machine Learning for Cancer Trial Outcome Prediction

Bouteiller, J.; Gryspeert, A.-R.; Caron, J.; Polit, L.; Altay, G.; Cabantous, M.; Pietrzak, R.; Graziosi, F.; Longarini, M.; Schutte, K.; Cartry, J.; Mathieu, J. R.; Bedja, S.; Boileve, A.; Ducreux, M.; Pages, D.-L.; Jaulin, F.; Ronteix, G.

2026-04-11 oncology 10.64898/2026.04.10.26350512 medRxiv
Top 14%
0.6%

Background: Predicting whether a treatment will demonstrate meaningful clinical benefit before committing to a large-scale trial remains a major unmet need in oncology. Patient-derived organoids (PDOs) recapitulate individual tumor drug sensitivity, but have not been used to forecast population-level trial outcomes. We developed SCOPE (Screening-to-Clinical Outcome Prediction Engine), a platform that integrates PDO drug screening with clinical prognostic modeling to predict arm-level median progression-free survival (mPFS) and objective response rate (ORR) without access to any trial outcome data. Patients and methods: SCOPE was trained on 54 treatment lines from patients with metastatic colorectal cancer (mCRC, n=15) and metastatic pancreatic ductal adenocarcinoma (mPDAC, n=39) with matched clinical data and PDO drug screening across 9 compounds. A Clinical Score module captures baseline prognosis; a Drug Screen Score module quantifies treatment-specific organoid sensitivity. To predict trial outcomes, synthetic patient profiles are generated from published eligibility criteria and matched to a biobank of 81 PDO lines. Predictions were externally validated against 32 arms from 23 published trials, treatment ranking was assessed across 8 head-to-head comparisons, and prospective applicability was tested for daraxonrasib (RMC-6236), a novel pan-RAS inhibitor in mPDAC. Results: Predicted mPFS strongly agreed with published outcomes (R2=0.85, MAE=0.82 months; Pearson r=0.92, P<0.001), approaching the empirical concordance between two independently measured clinical endpoints (ORR vs. mPFS, R2=0.87). ORR prediction was similarly robust (R2=0.71, MAE=7.3 percentage points). Integrating organoid and clinical data significantly outperformed either alone (P=0.001). SCOPE correctly identified the superior arm in 7 of 8 head-to-head comparisons (88%, P<0.05). 
Applied to daraxonrasib prior to phase 3 data availability, the platform predicted superiority over standard chemotherapy in KRAS-mutant mPDAC, consistent with emerging clinical data. Conclusion: By combining functional organoid drug screening with clinical modeling, SCOPE generates calibrated efficacy predictions for both established regimens and novel agents without prior clinical data. This approach could support clinical trial design, treatment arm selection, and go/no-go decisions, offering a new tool to improve the efficiency of gastrointestinal cancer drug development.

16
Fine-Tuning PubMedBERT for Hierarchical Condition Category Classification

Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.

2026-04-15 health systems and quality improvement 10.64898/2026.04.13.26350814 medRxiv
Top 15%
0.5%

Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves at best a macro-F1 of 0.735 and a micro-F1 of 0.708, under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4%-5% in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
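
For readers less familiar with the multi-label metrics quoted above, the sketch below shows how macro-F1 and micro-F1 differ and what inverse-frequency label weights look like. It uses the standard definitions on a toy label matrix; the function names and the data are hypothetical, and the exact weighting formula used by HCC-Coder is not given in the abstract, so `n / positives` is an assumption.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred):
    """Macro-F1 averages per-label F1; micro-F1 pools counts over all labels."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    micro = f1_from_counts(*(sum(col) for col in zip(*counts)))
    return macro, micro

def inverse_frequency_weights(y_true):
    """Per-label weight n / positives (assumed form); rare labels weigh more."""
    n = len(y_true)
    positives = [sum(col) for col in zip(*y_true)]
    return [n / p if p else 0.0 for p in positives]

# Toy data: 4 notes, 2 labels; label 1 is rare and is never predicted correctly.
y_true = [[1, 0], [1, 0], [1, 1], [0, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 1]]

macro, micro = macro_micro_f1(y_true, y_pred)
weights = inverse_frequency_weights(y_true)
print(macro, micro, weights)
```

In this toy case the frequent label is predicted perfectly and the rare one not at all, so macro-F1 (0.5) falls below micro-F1 (0.75). Macro-F1 is the figure that penalizes poor performance on rare codes, which is why label imbalance matters for it.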

17
SPLIT: Safety Prioritization for Long COVID Drug Repurposing via a Causal Integrated Targeting Framework

Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.

2026-04-16 health informatics 10.64898/2026.04.12.26350701 medRxiv
Top 15%
0.5%

Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.

18
Mutation timing, accumulation and selection in the male germline shape inheritance risk for developmental disorders

Neville, M. D. C.; Neuser, S.; Sanghvi, R.; Christopher, J.; Roberts, K.; Smith, K.; ONeill, L.; Hayes, J.; Cagan, A.; Hurles, M. E.; Goriely, A.; Abou Jamra, R.; Rahbari, R.

2026-04-13 genetic and genomic medicine 10.64898/2026.04.09.26350474 medRxiv
Top 15%
0.5%

De novo mutations (DNMs) arising in the parental germline are a major cause of severe developmental disorders. While most DNMs originate in the paternal germline, it remains unclear whether fathers of affected children carry a systematically altered burden of transmissible germline risk, or whether disease largely reflects stochastic outcomes of shared population-wide mutational processes. Here, we combined whole-genome sequencing of 168 parent-child trios with ultra-accurate duplex sequencing of paternal sperm to directly relate transmitted DNMs to the broader mutational and selective landscape of the male germline. In 127 fathers, sperm mutation burden and mutational spectra were indistinguishable from population reference cohorts. Positive selection metrics were likewise concordant, with a global dN/dS of 1.56 (95% CI 1.45-1.67) compared to 1.44 (95% CI 1.17-1.77) in controls and 28 of 32 significantly selected genes overlapping with prior findings. Six fathers harboured a pathogenic early mosaic variant detectable in sperm at allele fractions that ranged from 0.7% to 14.8%. Although these variants generated substantial individual-level risk outliers, they accounted for only ~11% of the aggregated exome pathogenic burden across the cohort. The remaining burden was distributed across low-VAF mutations, including positively selected driver variants and other rare mutations accumulating with paternal age. Together, these results show that transmissible de novo disease risk is governed primarily by universal germline mutational and selective processes, while early developmental mosaicism produces uncommon but clinically meaningful deviations. This integrated view clarifies how mutation timing, age-associated accumulation and germline selection jointly shape inheritance risk.

19
Colibactin-associated mutations in the human colon appear to reflect anatomy and early exposure, not oncogenesis

Hiatt, L.; Peterson, E. V.; Happ, H. C.; Major-Mincer, J.; Avvaru, A.; Goclowski, C. L.; Garretson, A.; Sasani, T. A.; Hotaling, J. M.; Neklason, D. W.; Uchida, A. M.; Quinlan, A. R.

2026-04-15 genetic and genomic medicine 10.64898/2026.04.13.26350783 medRxiv
Top 16%
0.4%

Colorectal cancer (CRC) is the second leading cause of cancer death globally and the number one cause of cancer death in people under 50 years old. The reasons for the rise of early-onset CRC are unknown, and while anatomically distinct subtypes of CRC have substantial clinical and molecular associations, the etiology of region-specific disease, such as early-onset CRC's enrichment in the distal colon, remains unclear. Understanding regional mutagenesis may identify risk factors for this public health concern and CRC more broadly. To evaluate mutational dynamics across the premalignant colon, we performed whole-genome sequencing of 125 individual colon crypts taken from six standardized regions biopsied during colonoscopy, collected from 11 donors without polyps and 10 with polyps. We observed mutation spectra and accumulation rates consistent with previous whole-organ studies, with greater subclonal mutation capture enabled by experimental design. T>[A,C,G] mutations, which are associated with colibactin genotoxicity from pks+ Escherichia coli, were significantly enriched in the rectum of donors with and without polyps (adjusted p-values < 0.01). Moreover, when comparing findings to crypts from individuals with CRC and sequenced CRC tumors, we observed consistent enrichment of the colibactin-associated mutational signature "ID18" in the rectum in both normal colon crypts and CRC tumors, without significant difference in colibactin-specific single nucleotide variant or insertion-deletion burden in crypts across the three clinical groups (i.e., no polyp, polyp, and CRC). These findings argue against a causal or prognostic role for colibactin in CRC, instead indicating that the proposed association with early-onset disease reflects anatomic specificity rather than cancer-specific clinical relevance.

20
Generational gains in memory capacity and stability may account for declining dementia incidence rates in Europe and the United States

Fjell, A. M. M.; Grodem, E. O. S. O. S.; Lunansky, G.; Vidal-Pineiro, D.; Rogeberg, O. J.; Walhovd, K. B.

2026-04-15 neurology 10.64898/2026.04.14.26350835 medRxiv
Top 17%
0.4%

Dementia incidence has been declining in Western societies for decades, but whether this reflects higher cognitive capacity entering old age, slower cognitive decline, or both remains unresolved. Analysing ~783,000 episodic memory assessments from ~219,000 individuals across five longitudinal cohorts, we find that later-born cohorts benefit from a double dividend: higher memory levels entering old age and slower rates of decline. The projected 20-year cohort advantage at age 80 is of sufficient magnitude to plausibly account for the observed 13% per-decade decline in dementia incidence reported in meta-analyses. Generational gains are disproportionately concentrated among the fastest-declining individuals, and are reflected in lower hippocampal atrophy rates in an independent sample. A formal bounding analysis shows that the double dividend is robust across a range of plausible period assumptions, consistent with environmental conditions operating across the lifespan having reshaped the architecture of human cognitive aging.